
Learning to Orient Surfaces by Self-supervised Spherical CNNs

Neural Information Processing Systems

Defining and reliably finding a canonical orientation for 3D surfaces is key to many Computer Vision and Robotics applications. This task is commonly addressed by handcrafted algorithms exploiting geometric cues deemed as distinctive and robust by the designer. Yet, one might conjecture that humans learn the notion of the inherent orientation of 3D objects from experience and that machines may do so alike. In this work, we show the feasibility of learning a robust canonical orientation for surfaces represented as point clouds. Based on the observation that the quintessential property of a canonical orientation is equivariance to 3D rotations, we propose to employ Spherical CNNs, a recently introduced machinery that can learn equivariant representations defined on the Special Orthogonal group SO(3). Specifically, spherical correlations compute feature maps whose elements define 3D rotations. Our method learns such feature maps from raw data by a self-supervised training procedure and robustly selects a rotation to transform the input point cloud into a learned canonical orientation. Thereby, we realize the first end-to-end learning approach to define and extract the canonical orientation of 3D shapes, which we aptly dub Compass. Experiments on several public datasets prove its effectiveness at orienting local surface patches as well as whole objects.
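The selection-and-apply step the abstract describes, taking the peak of a feature map defined over a discretized SO(3) grid and rotating the cloud accordingly, can be sketched as below. This is an illustrative sketch, not the paper's implementation: the feature map, the angle grid, and the function names are placeholders for the trained network's output.

```python
import numpy as np

def euler_zyz_to_matrix(alpha, beta, gamma):
    """Rotation matrix from ZYZ Euler angles, a common grid
    parameterization of SO(3) for spherical-CNN feature maps."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz1 = np.array([[ca, -sa, 0], [sa, ca, 0], [0, 0, 1]])
    Ry  = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz2 = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz1 @ Ry @ Rz2

def canonicalize(points, so3_feature_map, angle_grid):
    """Pick the rotation at the feature map's peak and apply it.

    so3_feature_map: (A, B, G) network response over a discretized
    SO(3) grid; angle_grid: matching (alphas, betas, gammas) arrays.
    Both stand in for the trained network's output.
    """
    # index of the strongest response on the SO(3) grid
    a, b, g = np.unravel_index(np.argmax(so3_feature_map),
                               so3_feature_map.shape)
    R = euler_zyz_to_matrix(angle_grid[0][a], angle_grid[1][b],
                            angle_grid[2][g])
    # rotate the cloud into the learned canonical pose
    return points @ R.T
```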


Eq.Bot: Enhance Robotic Manipulation Learning via Group Equivariant Canonicalization

Deng, Jian, Wang, Yuandong, Zhu, Yangfu, Feng, Tao, Wo, Tianyu, Shao, Zhenzhou

arXiv.org Artificial Intelligence

Robotic manipulation systems are increasingly deployed across diverse domains. Yet existing multi-modal learning frameworks lack inherent guarantees of geometric consistency, struggling to handle spatial transformations such as rotations and translations. While recent works attempt to introduce equivariance through bespoke architectural modifications, these methods suffer from high implementation complexity, computational cost, and poor portability. Inspired by human cognitive processes in spatial reasoning, we propose Eq.Bot, a universal canonicalization framework grounded in SE(2) group equivariant theory for robotic manipulation learning. Our framework transforms observations into a canonical space, applies an existing policy, and maps the resulting actions back to the original space. As a model-agnostic solution, Eq.Bot aims to endow models with spatial equivariance without requiring architectural modifications. Extensive experiments demonstrate the superiority of Eq.Bot under both CNN-based (e.g., CLIPort) and Transformer-based (e.g., OpenVLA-OFT) architectures over existing methods on various robotic manipulation tasks, where the most significant improvement reaches 50.0%.
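The canonicalize-act-map-back pattern the abstract describes can be sketched for SE(2) as follows. This is a minimal sketch of the general idea, not the authors' code: the observation layout, the identity policy in the usage note, and all function names are assumptions.

```python
import numpy as np

def se2_canonicalize(obs_xy, obs_theta):
    """Canonicalizing SE(2) map: translate the observation's reference
    point to the origin and rotate its heading to zero."""
    c, s = np.cos(-obs_theta), np.sin(-obs_theta)
    R = np.array([[c, -s], [s, c]])
    return R, -R @ obs_xy  # rotation and translation of the map

def run_canonicalized(policy, obs_xy, obs_theta, target_xy):
    """Wrapper in the spirit of Eq.Bot: the policy only ever sees
    canonical-frame inputs; its action is mapped back afterwards."""
    R, t = se2_canonicalize(obs_xy, obs_theta)
    canon_target = R @ target_xy + t        # observation -> canonical space
    canon_action = policy(canon_target)     # any existing, non-equivariant policy
    return np.linalg.inv(R) @ (canon_action - t)  # action -> original space
```

With an identity "go to the point you see" policy, the wrapper returns the original-frame target regardless of the observation's pose, which is exactly the spatial consistency the canonicalization is meant to provide.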




we propose a more general framework that can also be adopted to orient whole objects and perform rotation-invariant

Neural Information Processing Systems

Moreover, as recently shown by Bai et al. in "D3Feat: Joint Learning of Dense Detection [...]". We will add this information to the revised version. As suggested, we will use only the term "orientation"; we will modify it in the final version of the paper. We agree that the domain of Spherical CNN feature maps is key, and we will highlight it better in the final version. Since we seek a single rotation, the loss function in (6) is applied once, and only to the last layer of the network.


Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners?

Zhu, Huaijiang, Zhao, Tong, Ni, Xinpei, Wang, Jiuguang, Fang, Kuan, Righetti, Ludovic, Pang, Tao

arXiv.org Artificial Intelligence

The tremendous success of behavior cloning (BC) in robotic manipulation has been largely confined to tasks where demonstrations can be effectively collected through human teleoperation. However, demonstrations for contact-rich manipulation tasks that require complex coordination of multiple contacts are difficult to collect due to the limitations of current teleoperation interfaces. We investigate how to leverage model-based planning and optimization to generate training data for contact-rich dexterous manipulation tasks. Our analysis reveals that popular sampling-based planners like the rapidly-exploring random tree (RRT), while efficient for motion planning, produce demonstrations with unfavorably high entropy. This motivates modifications to our data generation pipeline that prioritize demonstration consistency while maintaining solution diversity. Combined with a diffusion-based goal-conditioned BC approach, our method enables effective policy learning and zero-shot transfer to hardware for two challenging contact-rich manipulation tasks.



Orient Anything

Scarvelis, Christopher, Benhaim, David, Zhang, Paul

arXiv.org Artificial Intelligence

Orientation estimation is a fundamental task in 3D shape analysis which consists of estimating a shape's orientation axes: its side-, up-, and front-axes. Using this data, one can rotate a shape into canonical orientation, where its orientation axes are aligned with the coordinate axes. Developing an orientation algorithm that reliably estimates complete orientations of general shapes remains an open problem. We introduce a two-stage orientation pipeline that achieves state-of-the-art performance on up-axis estimation and further demonstrate its efficacy on full-orientation estimation, where one seeks all three orientation axes. Unlike previous work, we train and evaluate our method on all of ShapeNet rather than a subset of classes. We motivate our engineering contributions by theory describing fundamental obstacles to orientation estimation for rotationally-symmetric shapes, and show how our method avoids these obstacles.
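The final canonicalization step, rotating a shape so that its estimated side-, up-, and front-axes align with the coordinate axes, can be sketched with numpy. The axis inputs here are hypothetical outputs of an orientation estimator, and the function names are illustrative, not taken from the paper.

```python
import numpy as np

def rotation_to_canonical(side, up, front):
    """Rotation aligning estimated side/up/front axes with x, y, z.

    Stacking the unit axes as rows gives a matrix R with
    R @ side = e_x, R @ up = e_y, R @ front = e_z; for a right-handed,
    mutually orthogonal axis triple, R is a proper rotation.
    """
    axes = np.stack([side, up, front]).astype(float)
    axes /= np.linalg.norm(axes, axis=1, keepdims=True)  # normalize rows
    return axes

def orient_shape(vertices, side, up, front):
    """Rotate a shape into canonical orientation from its axes."""
    R = rotation_to_canonical(side, up, front)
    return vertices @ R.T
```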